Electrostatic theory of viral self-assembly: a toy model 
Viruses self-assemble from identical capsid proteins and their genome consisting, for example, of a
long single-stranded (ss) RNA. For a large class of T = 3 viruses, capsid proteins have long positive
N-terminal tails. We explore the role played by the Coulomb interaction between the brush of positive
N-terminal tails rooted at the inner surface of the capsid and the negative ss RNA molecule. We
show that viruses are most stable when the total contour length of ss RNA is close to the total
length of the tails. For such a structure the absolute value of the total RNA charge is approximately
twice the charge of the capsid. This conclusion agrees with structural data.



Unlike living cells, viruses do not have any metabolic
activity, which may mean that they are in a state of
thermal equilibrium. This is one of the reasons why
statistical physics can be used to understand
viruses. The structure of viruses is also dramatically
simple. Inside the protein capsid each virus carries its
genome, which consists of one or more DNA or RNA
molecules and is used for reproduction in host cells. The
focus of this letter is on viruses with single-stranded RNA
(ss RNA) genomes. Detailed image reconstruction of
apparently spherical viruses reveals their icosahedral
symmetry. This is why such a virus capsid can be viewed as
a curved two-dimensional crystal closed on itself 0, 0, 01 • 

Here we concentrate on viruses of the so-called T = 3
class, in which a capsid is made of precisely 180
identical proteins, or of 60 triangular blocks consisting
of three proteins each (see Fig. 1). In vitro studies
of solutions of capsid proteins and RNA molecules of a
given virus show that at biological pH and salinity
they can spontaneously self-assemble into infectious
viruses [1, [H, [g] . This letter focuses on the energetics of 
this amazing protein-RNA self-assembly. In addition to 
hydrophobic attraction between the proteins it is driven 
by strong Coulomb attraction between capsid proteins 
and RNA molecules [H, 0]. Indeed, ss RNA is strongly 
negatively charged. Its backbone has one negative phosphate
per nucleotide, or one per 0.65 nm. We denote the total
ss RNA charge of a virus particle as $-Q_r$. According to
Tab. I, for T = 3 viruses $Q_r$ is several thousand
in units of the proton charge. On the other hand, for
many viruses the capsid proteins carry a substantial net
positive charge $q_p$, which can reach 17. The net positive
charge of the capsid of a T = 3 virus, $Q_c = 180 q_p$, can
therefore reach 3000. Although under biological conditions
the protein-RNA interaction is screened by monovalent
salt at the Debye-Hückel screening radius $r_s$, the attraction
energy of such large charges is still very large.
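
As a quick arithmetic check of the magnitudes quoted above, the following minimal sketch computes the capsid charge for a given protein charge and the contour length of an ss RNA carrying a given number of phosphate charges. The function names are ours and are introduced only for illustration; the numbers 180, 17 and 0.65 nm are those quoted in the text, while the RNA charge of 4000 is just an example of "several thousand".

```python
# Rough charge bookkeeping for a T = 3 capsid, using numbers quoted in the text.
N_PROTEINS = 180        # proteins in a T = 3 capsid
NT_SPACING_NM = 0.65    # contour length per nucleotide (one phosphate charge each)


def capsid_charge(q_p):
    """Total capsid charge Q_c = 180 * q_p, in units of the proton charge."""
    return N_PROTEINS * q_p


def rna_contour_length_nm(q_r):
    """Contour length (nm) of an ss RNA carrying q_r phosphate charges."""
    return q_r * NT_SPACING_NM


if __name__ == "__main__":
    # q_p = 17 is the largest protein charge mentioned in the text.
    print("Q_c for q_p = 17:", capsid_charge(17))
    # 4000 is an illustrative value for "several thousand" RNA charges.
    print("contour length for Q_r = 4000 (nm):", rna_contour_length_nm(4000))
```

For $q_p = 17$ the sketch gives $Q_c = 3060$, consistent with the statement that the capsid charge can reach about 3000.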

A dramatic feature of group A of the T = 3 viruses, collected
in the upper part of Tab. I, is that almost all the
capsid protein charge is concentrated in the N-terminal
tail located inside the capsid (Fig. 1). We define such an
N-terminal tail as the flexible sequence of amino acids
which starts from the N-terminus of the protein and ends
at the first α-helix or β-sheet. It looks like evolution
created cationic N-terminal tails for the strong interaction
with the ss RNA genome (Fig. 1b).

FIG. 1: (color online) Schematic sketch of the protein capsid
assembly. (a) Triangular block made of three proteins (blue)
with their positive flexible N-terminal tails (red). (b) The
brush of positive N-terminal tails rooted at the inner surface
of the capsid made of triangular blocks. The ss RNA (green)
strongly interacts with the tails and keeps all the blocks
together.

In this letter we concentrate on the electrostatic interaction
of the ss RNA with the brush of tails of a group A
virus (see Fig. 1b). In particular, we want to understand
the remarkable fact that for these viruses the absolute value
of the ss RNA charge $Q_r$ is substantially larger than the
total charge of the capsid $Q_c = 180 q_p$. The charge inversion
ratios $R = Q_r/Q_c$ for them are given in Tab. I.
They are scattered, with the median value 1.8. This raises
the challenging question of whether such a ratio can be obtained
by minimizing the free energy of the virus [9] with respect to
the RNA length. A positive answer to this question was
recently given in the framework of the simplest model,
where positive protein charges are uniformly smeared on
the internal surface of the capsid, while the ss RNA is
adsorbed on this surface as a negative polyelectrolyte [9].
As we see from Tab. I, the capsid charges of all the group A
viruses are concentrated in the tails. That is why we suggest
an alternative model of virus self-assembly, namely
adsorption of ss RNA on a brush of flexible positive tails
rooted on a neutral surface. Minimizing the free energy
of such self-assembly with respect to the total ss RNA
length, we arrive at theoretical charge inversion ratios
$\mathcal{R}$, which are quite close to the actual ones, $R$.
TABLE I: The absolute value of the ss RNA charge $Q_r$, the charge
of the capsid protein $q_p$, the N-terminal tail charge $q_t$, the
number of amino acids in the tail $N_t$, the ratio $\eta_r/\eta_t$ of the linear
charge densities (in the fully stretched state) of the ss RNA, $\eta_r$,
and the tail, $\eta_t$, the ratio $N_d/N_t$, where $N_d$ is the number of amino
acids in the disordered part of the tail, and the actual and predicted
charge inversion ratios $R$ and $\mathcal{R}$. The data are obtained from
Refs. 0, [1. In the group A most of the capsid charges are 
concentrated in the tails. In the group B the protein charges 
are large but the tails are practically neutral. In the group C 
the charges of both capsid proteins and tails are very small. 
We call our model a toy model because we start from
the following two simplifications. (i) First, similar to
Ref. Q we neglect hydrogen bonds between ss RNA bases 
which lead to the secondary structure of ss RNA. (ii) 
Second, we assume that each tail is free (does not stick 
to the capsid surface). Actually, for some tails the part
close to the tail root sticks to the capsid surface [10].
Only this part of the N-terminal tail is seen in the X-ray
images of the crystallized viruses, while the rest of the
tail is missing. The missing part of the tail strongly fluctuates
and is called disordered. We call $N_d$ the average number
of amino acids in the disordered (free) part of the tail.
The ratios $N_d/N_t$ are given in Tab. I. We see that on average
76% of the tail length is free. In our toy model we assume
that $N_d = N_t$.
FIG. 2: (color online) Complexes of the long ss RNA (green)
with a cationic tail (red) rooted on the internal surface of the
capsid (blue). $L$ is the length of the tail, $X$ is the length
of the RNA piece which complexes with the tail. The structure
and the magnitude of $X$ depend on the ratio between the
charge densities of the tail and the ss RNA. (a) $X > L$, when
$\eta_r < 2\eta_t$; (b) $X = L$, when $\eta_r > 2\eta_t$.



Let us first consider the interaction of a homo-polymeric ss
RNA with a single free cationic N-terminal tail rooted at
the neutral internal surface of the capsid (Fig. 2a). We
assume that in the fully stretched state each tail has length
$L$ and the positive linear charge density $\eta_t$, while the
very long ss RNA in the fully stretched state has the negative
linear charge density $-\eta_r$. The ss RNA piece of
length $X > L$ complexes with the tail. Both polymers
are modelled as worm-like chains with the same radius
$b$, which is simultaneously of the order of their bare
persistence length $p_0$ (which does not include Coulomb
self-repulsion). The third important assumption of our
toy model is that (iii) the solution has a moderate salt
concentration, so that $b \ll r_s \ll L$. We argue below
that relaxing this assumption does not change our results
qualitatively.

Due to the strong Coulomb repulsion inside the overcharged
complex, the strongly negatively charged ss RNA
has a relatively large persistence length $p \sim r_s^2/p_0$ (see
Ref. [13]), so that its Coulomb energy can be estimated
as the energy of a rigid cylinder of radius $b$.
The same is true for the complex of the N-terminal tail and
the ss RNA, which, as we will see, has a large negative
linear charge density $\eta^*$. Self-repulsion of these negative
charges makes the complex locally stretched, so that its
total length equals $L$. Therefore, $\eta^* = (-X\eta_r + L\eta_t)/L$.
The tail-RNA complex with the long ss RNA shown in
Fig. 2a has a large electrostatic energy. Therefore,
the contribution to the free energy $F$ from configurational
entropy plays a minor role and can be neglected. Since
$r_s \ll L < X$, the Coulomb interaction is truncated at $r_s$.
As a result, we obtain the following simple expression for
the $X$-dependent part of the free energy:



$$F(X) = L\left(\frac{-X\eta_r + L\eta_t}{L}\right)^2 \ln\frac{r_s}{b} - X\eta_r^2 \ln\frac{r_s}{b}. \qquad (1)$$



The first term represents the self-energy of the overcharged
N-terminal tail (the complex), while the second
term represents the loss of the electrostatic energy
of the ss RNA segment of length $X$. Here we neglect
the Coulomb repulsion between the complex and
the rest of the ss RNA because $r_s \ll L, X$. Minimizing
$F(X)$ with respect to $X$, we find the optimal
$X = X_0 = (\eta_t/\eta_r + 1/2)L$, and the linear charge density
of the complex $\eta^* = -\eta_r/2$ [14]. As we expected,
$\eta^*$ is negative, so the N-terminal tail is overcharged by
the ss RNA. The above calculation is valid if ss RNA
wraps around the tail (Fig. 2a) and, therefore, $X_0 > L$.
This happens only at $\eta_r/\eta_t < 2$. On the other hand, at
$\eta_r/\eta_t = 2$ the length of the ss RNA segment in the complex,
$X_0$, reaches the minimum possible value $X_0 = L$,
corresponding to stretched ss RNA. At $\eta_r/\eta_t > 2$ both
polymers are stretched (Fig. 2b) by the Coulomb self-repulsion,
$X_0 = L$, and $\eta^* = \eta_t - \eta_r < -\eta_t$. Thus, at
$\eta_r/\eta_t > 2$ the tail is overcharged by ss RNA more than
twice.
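
The minimization step can be written out explicitly. The sketch below assumes that the free energy has the form $F(X) = L[(-X\eta_r + L\eta_t)/L]^2\ln(r_s/b) - X\eta_r^2\ln(r_s/b)$, which is our reading of Eq. (1). The logarithmic factor is common to both terms and drops out of the result.

```latex
\begin{align*}
\frac{dF}{dX} &= \left[-2\eta_r\,\frac{-X\eta_r + L\eta_t}{L} - \eta_r^2\right]\ln\frac{r_s}{b} = 0
\;\Longrightarrow\; \frac{2\,(X\eta_r - L\eta_t)}{L} = \eta_r, \\
X_0 &= \left(\frac{\eta_t}{\eta_r} + \frac{1}{2}\right)L, \qquad
\eta^* = \frac{-X_0\eta_r + L\eta_t}{L} = -\frac{\eta_r}{2}, \\
\frac{d^2F}{dX^2} &= \frac{2\eta_r^2}{L}\,\ln\frac{r_s}{b} > 0 .
\end{align*}
```

The positive second derivative (for $r_s > b$) confirms that $X_0$ is a minimum, and the condition $X_0 > L$ is equivalent to $\eta_t/\eta_r > 1/2$, that is, $\eta_r/\eta_t < 2$.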

Until now we assumed that the ss RNA length $\mathcal{L}$ is
always larger than $X_0$, so that $X_0$ does not depend on $\mathcal{L}$.
Let us now imagine that we vary $\mathcal{L}$ at fixed $L$, $\eta_t$ and $\eta_r$.
Then for a short ss RNA, $\mathcal{L} < X_0$ (where $X_0$ is still the
optimum value of $X$ found above), the new optimum value
of $X$ equals $\mathcal{L}$ (the N-terminal tail consumes
all available ss RNA). This means that at $\mathcal{L} < X_0$ the
electrostatic energy decreases with growing $\mathcal{L}$, while for
$\mathcal{L} > X_0$ the energy saturates. Thus, the complex of ss RNA
with an N-terminal tail is most stable if $\mathcal{L} > X_0$.
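
The single-tail argument can be checked numerically. The following is a minimal sketch, assuming the free energy $F(X) = L\,\eta^{*2}\ln(r_s/b) - X\eta_r^2\ln(r_s/b)$ with $\eta^* = (-X\eta_r + L\eta_t)/L$ (our reading of the expression above) and valid for RNA pieces no shorter than the tail. The function names, the parameter `log_factor` (standing for $\ln(r_s/b)$) and the numerical values are illustrative assumptions, not data from the text.

```python
def free_energy(x, tail_len, eta_t, eta_r, log_factor=1.0):
    """X-dependent free energy of one tail-RNA complex (assumed form).

    log_factor stands for ln(r_s / b); it only rescales the energy.
    The expression is meant for x >= tail_len.
    """
    eta_star = (-x * eta_r + tail_len * eta_t) / tail_len
    return log_factor * (tail_len * eta_star ** 2 - x * eta_r ** 2)


def optimal_x(tail_len, eta_t, eta_r):
    """Optimal complexed RNA length X_0, never shorter than the tail."""
    x0 = (eta_t / eta_r + 0.5) * tail_len
    return max(x0, tail_len)


def bound_energy(rna_len, tail_len, eta_t, eta_r):
    """Energy when the tail takes min(rna_len, X_0) of the available RNA."""
    x = min(rna_len, optimal_x(tail_len, eta_t, eta_r))
    return free_energy(x, tail_len, eta_t, eta_r)


if __name__ == "__main__":
    # Illustrative values in the wrapping regime eta_r / eta_t = 1.5 < 2.
    tail_len, eta_t, eta_r = 15.0, 1.0, 1.5
    print("X_0 =", optimal_x(tail_len, eta_t, eta_r))
    for rna_len in (15.0, 16.0, 17.5, 30.0):
        print(rna_len, bound_energy(rna_len, tail_len, eta_t, eta_r))
```

With these illustrative values the energy decreases as the available RNA length grows up to $X_0 = 17.5$ (in the same units as the tail length) and stays constant beyond it.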

Now we can switch from a single N-terminal tail to
the whole brush of 180 tails and a very long ss RNA
with the length $\mathcal{L}$ comparable to $180L$. The average
distance $a$ between two neighboring tail roots (see Fig.
1) is typically close to 5 nm. We deal with $r_s$ much
smaller than $a$, so that complexes of the nearest-neighbor
tails with RNA can be treated separately. This means
that a long enough ss RNA goes from one tail to another,
consecutively overcharging each of them in the way we
calculated above for a single tail (Fig. 1b).

It is easy to show that if $\mathcal{L} < 180X_0$, the ss RNA is shared
between the tails in equal portions $\mathcal{L}/180 < X_0$. In this case
the total electrostatic energy still goes down with growing
$\mathcal{L}$. (Here and below we neglect the length of ss RNA
per tail necessary to connect the tail roots: it is of the
order of $a/2 \ll L$. Indeed, according to Tab. I, $L \sim 15$
nm, while $a/2 \sim 2.5$ nm.) On the other hand, when
$\mathcal{L} > 180X_0$ and each N-terminal tail gets the length $X_0$
of ss RNA, the electrostatic energy saturates at a low level
and does not depend on $\mathcal{L}$. At this point, in order to
find the optimal length of ss RNA for given tails, we should
recall the excluded volume interaction energy, which is
smaller than the electrostatic energy but provides the
growth of the free energy with $\mathcal{L}$ at $\mathcal{L} > 180X_0$. Indeed,
one should take into account that due to screening
the persistence length of the tail-RNA complex is much
smaller than the tail length $L$, and the tail-RNA "arches"
are not extended as shown in Fig. 1b, but rather tend to
make coils. This leads to a noticeable excluded volume
interaction. Thus, for given tails the free energy reaches
its minimum at $\mathcal{L} \simeq 180X_0$. (A similar minimum was
obtained earlier for the model of protein charges uniformly
smeared on the internal capsid surface [9].) For the
theoretical charge inversion ratio $\mathcal{R}$ we arrive at



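$$\mathcal{R} = \frac{X_0\eta_r}{L\eta_t} = \begin{cases} 1 + \eta_r/(2\eta_t), & \text{when } \eta_r < 2\eta_t, \\ \eta_r/\eta_t, & \text{when } \eta_r > 2\eta_t. \end{cases} \qquad (2)$$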
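
The piecewise result for the theoretical charge inversion ratio is easy to evaluate. The following minimal sketch assumes the two branches $\mathcal{R} = 1 + \eta_r/(2\eta_t)$ for $\eta_r < 2\eta_t$ and $\mathcal{R} = \eta_r/\eta_t$ otherwise, as we read Eq. (2). The function name is ours, introduced only for illustration.

```python
def charge_inversion_ratio(density_ratio):
    """Theoretical charge inversion ratio as a function of eta_r / eta_t.

    density_ratio < 2: RNA wraps around the tail, ratio = 1 + density_ratio / 2.
    density_ratio >= 2: RNA is stretched along the tail, ratio = density_ratio.
    """
    if density_ratio <= 0:
        raise ValueError("density_ratio must be positive")
    if density_ratio < 2.0:
        return 1.0 + density_ratio / 2.0
    return density_ratio


if __name__ == "__main__":
    for ratio in (1.0, 1.5, 2.0, 2.5, 3.0):
        print(ratio, charge_inversion_ratio(ratio))
```

The two branches meet at $\eta_r/\eta_t = 2$, where both give $\mathcal{R} = 2$, so the predicted ratio is continuous and never falls below 1.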



In Tab. I we calculated the ratio $\eta_r/\eta_t$ for the group
A viruses using 0.65 nm for the distance between two
charges of ss RNA and 0.34 nm for the length of the tail
per amino acid. We see that for most of the viruses
$\eta_r/\eta_t > 2$ and, therefore, ss RNA is stretched along the
N-terminal tails (Fig. 1b), so that a simple way to formulate
our results for the length of ss RNA is to say that
the total length of ss RNA $\mathcal{L}$ is equal to the total length
of the tails $180L$. Substituting the values of $\eta_r/\eta_t$ from Tab.
I in Eq. (2), we arrived at the values of $\mathcal{R}$ listed in Tab. I.
We see that most of them are in reasonable agreement
with the structural data [la ]. 
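
The conversion from tail data to the density ratio can be sketched as follows. The code uses the two lengths quoted above (0.65 nm per RNA charge, 0.34 nm per amino acid). The tail charge $q_t = 10$ and tail size $N_t = 45$ are hypothetical values chosen only for illustration and are not entries of Tab. I; the function names are also ours.

```python
NT_SPACING_NM = 0.65   # distance between two charges of ss RNA
AA_LENGTH_NM = 0.34    # tail length per amino acid
N_TAILS = 180          # tails in a T = 3 capsid


def tail_length_nm(n_t):
    """Fully stretched tail length L for a tail of n_t amino acids."""
    return AA_LENGTH_NM * n_t


def density_ratio(q_t, n_t):
    """Ratio eta_r / eta_t of the RNA and tail linear charge densities."""
    eta_r = 1.0 / NT_SPACING_NM           # charges per nm along the RNA
    eta_t = q_t / tail_length_nm(n_t)     # charges per nm along the tail
    return eta_r / eta_t


def stretched_rna_charge(n_t):
    """RNA charge when the total RNA length equals 180 L (stretched regime)."""
    return N_TAILS * tail_length_nm(n_t) / NT_SPACING_NM


if __name__ == "__main__":
    q_t, n_t = 10, 45   # hypothetical tail, for illustration only
    print("L (nm):", tail_length_nm(n_t))
    print("eta_r / eta_t:", density_ratio(q_t, n_t))
    print("Q_r for stretched RNA:", stretched_rna_charge(n_t))
```

For this hypothetical tail the ratio exceeds 2, so the RNA is in the stretched regime, and the RNA charge divided by the total tail charge $180 q_t$ reproduces $\eta_r/\eta_t$.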

This agreement may be interpreted as a result of natural
evolution of viruses in the direction of maximum
viral stability. It is desirable, however, to design
an in vitro experiment which verifies our predictions.
Before suggesting such an experiment, let us note that although
above we discussed only the packaging of a single
ss RNA molecule in a virus, our conclusions can be extended
to the case where many shorter ss RNA pieces
are packaged in the virus. They just continue each other
inside the virus and bind the proteins together. Our predictions,
therefore, can be verified by experiments with a
solution of relatively short homo-polymeric ss RNA with
the length $\mathcal{L}$ in the range $2L < \mathcal{L} < 180L$. We suggest
an equilibrium experiment with a series of solutions
which have a varying ratio $p$ of the total charges of short
ss RNA and capsid proteins. At $p \sim 1$ in equilibrium
all ss RNA molecules are used up in viruses, so
that there is no free ss RNA. With growing $p$, free ss RNA
should appear at the critical point $p = p_c = \mathcal{R}$, where,
according to our theory, free ss RNA molecules and ss
RNA molecules inside the virus are in equilibrium. Using
short ss RNA makes it possible to vary the amount of ss RNA in a
virus almost continuously in order to find $p_c$ and compare
it with $\mathcal{R}$.
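
The expected outcome of such a titration can be sketched with a simple bookkeeping model. This is an idealization added for illustration and is not part of the theory above: it assumes that all capsid proteins end up in complete viruses, that each virus packages RNA charge up to exactly $p_c$ times its capsid charge, and that any excess RNA stays free in solution. The function name is ours.

```python
def free_rna_fraction(p, p_c):
    """Fraction of the total ss RNA charge left free in solution.

    p   : ratio of total RNA charge to total capsid protein charge.
    p_c : critical ratio at which free RNA first appears.
    Idealized assumption: viruses absorb RNA up to p_c, the rest is free.
    """
    if p <= 0 or p_c <= 0:
        raise ValueError("p and p_c must be positive")
    if p <= p_c:
        return 0.0
    return 1.0 - p_c / p


if __name__ == "__main__":
    p_c = 1.8   # illustrative value, equal to the median ratio quoted earlier
    for p in (1.0, 1.8, 2.4, 3.6):
        print(p, free_rna_fraction(p, p_c))
```

In this idealized picture the free fraction is zero up to $p = p_c$ and then grows as $1 - p_c/p$, so the onset of free RNA marks $p_c$.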

Let us now discuss the assumptions (i), (ii) and (iii)
of our toy model, starting from assumption (ii), that
one can treat an N-terminal tail with a part of it sticking
to the internal capsid surface as a free tail. The picture
of RNA going along one side of the tail without wrapping
does not seem to be too sensitive to the fact that the
other side of the tail sticks to the capsid. This (together
with the fact that on average only 24% of the tail length
sticks to the capsid surface) makes (ii) reasonable.
Assumption (iii) is more problematic because the biological
values are $r_s \sim b \sim 1$ nm. They easily satisfy the inequalities
$r_s \ll L, a$, but do not literally satisfy the assumption
that $r_s \gg b$. This assumption was important in order to
say that the ss RNA and the N-terminal tail-RNA complex are
stretched and the Coulomb energy dominates the configurational
entropy. We argue here that according to numerical
simulations [13] for a very flexible polyelectrolyte
(with the bare persistence length equal to the Bjerrum
length), even for such a small $r_s$ the Coulomb interaction
plays a strong role: its persistence length grows threefold
already at $r_s = 1$ nm. For a less flexible polyelectrolyte,
such as ss RNA or the tail-RNA complex, the
Coulomb interaction should play an even stronger role, so
that as a zero-order approximation the configuration of
the complex shown in Fig. 2 is reasonable. Assumption
(i), that ss RNA behaves as a flexible linear
polyelectrolyte, is not a restriction for a homo-polymeric ss
RNA or a generic linear polyelectrolyte used for virus
self-assembly in vitro. On the other hand, for the viral ss
RNA, the energy of hydrogen bonds should be optimized
together with the electrostatic energy. It seems that the effect
of such global optimization will not differ much from
our result, but this remains to be shown.
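
The statement that biological screening radii are about 1 nm can be illustrated with the standard approximation for the Debye length of a monovalent (1:1) salt in water at 25 °C, $r_s \approx 0.304/\sqrt{I}$ nm with the ionic strength $I$ in mol/L. This formula and the function name are additions for illustration and do not come from the text above.

```python
import math


def debye_length_nm(ionic_strength_molar):
    """Debye screening length (nm) for a 1:1 electrolyte in water at 25 C.

    Uses the standard approximation r_s = 0.304 / sqrt(I), I in mol/L.
    """
    if ionic_strength_molar <= 0:
        raise ValueError("ionic strength must be positive")
    return 0.304 / math.sqrt(ionic_strength_molar)


if __name__ == "__main__":
    for ionic_strength in (0.01, 0.1, 0.15):
        print(ionic_strength, "M ->", round(debye_length_nm(ionic_strength), 2), "nm")
```

At 0.1 M monovalent salt this gives a screening length just under 1 nm, comparable to the polymer radius $b$, which is why the condition $r_s \gg b$ is not literally satisfied.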

Up to now we have dealt with group A. In
group B the charges of the capsid proteins are large but the
tails are practically neutral, so that the theory of Ref. [9]
is appropriate. In group C the charges of proteins
and tails are very small, but it is possible that for some
viruses the internal surface of the capsid proteins is positively
charged, while the negative charges are on the external
surface [18]. In this case, one may also redefine $R$ as the ratio
of the ss RNA charge to the total charge of the internal
surface of the capsid and use Ref. to estimate R. 

In this paper we focused on T=3 viruses, because they 
attract most of physicists attention 0, 0, [13] . As we saw 
many of their capsid proteins have long positive tails.
Capsid proteins of some T = 1, 4 and 7 viruses also have
positively charged tails. Our theory is applicable to them
as well. A detailed analysis of these classes is beyond the scope
of this paper.

In conclusion, the data 0, [1| show that there is a big 
group of viruses where practically all positive charges of
a capsid protein are concentrated in a long and flexible
N-terminal tail. For a given length and charge of the tail we
optimized the length of the ss RNA genome by searching
for the minimum of the free energy of the virus. We arrived at
the very simple result that a virus is most stable when
the total length of ss RNA is close to the total length of
the tails. This result is in reasonable agreement with the
viral structural data 0, @| . This may be interpreted as a 
result of evolution in the direction of viral stability. 